The difficulties in evaluating bilingual education
appear to have prevented success in all but a few evaluation
attempts, but better and more meaningful evaluation is necessary in
order to identify the strengths and weaknesses of bilingual programs.
Many bilingual programs are undergoing constant modification,
adequate assessment instruments have been lacking, and political
pressures have interfered with evaluation programs. Since it is hard
to find individuals who are expert in both evaluation and bilingual
education, evaluation teams are recommended rather than individual
evaluators. Control groups are not normally available, so
quasi-experimental designs must be used. The author recommends the
Before-and-After and particularly the Time Series designs. Several
precautions are listed. Independent variables for an evaluation are
listed and categorized as contextual variables, student variables,
and treatment variables. Dependent variables must reflect the goals
and objectives of the bilingual program and must be valid. Good
comprehensive documentation is also essential, particularly in
longitudinal research. (CTH)
Introduction

Bilingual education is enjoying its first decade of prominence in the
United States. In 1963, Dade County in Florida started a public school
Spanish-English bilingual program for Cuban Americans and Anglos. In 1967,
the Bilingual Education Act was added to the Elementary and Secondary
Education Act of 1965. Federally funded Title VII bilingual education
programs began in 1968. More recently, states have passed legislation to
fund bilingual programs (Swanson, 1974; U.S. Commission on Civil Rights,
1975).

At a time when the implementation of bilingual programs has reached such
a peak, the evaluation of programs lags far behind. Despite millions spent on
the development of programs, the United States experience to date has yielded
few meaningful insights into various aspects of program design (Troike, 1974;
U.S. Commission on Civil Rights, 1975; Ramirez et al., 1975). Reasons for this
lack of hard data include the following:

(1) It is hard, if not impossible, to obtain meaningful research
results from pilot programs that are constantly undergoing
modification, presumably for the better. Even if summative
results are obtained, the researcher is hard put to give a
label to a particular treatment, since it is in such a state
of flux.

(2) There has been such a pressing need for formative evaluation
of project-oriented goals, specifically behavioral objectives
contained in the curriculum, that no time has remained for
evaluating other things.

(3) Until recently there have not been adequate assessment instruments,
particularly bilingual ones, and even now much test development
and norming are called for.

(4) Political threats to bilingual schooling have almost forced
evaluation reports to be public relations documents.

(5) Evaluators have tended to be persons unfamiliar with the
particular needs and characteristics of bilingual education.

The "fledgling program" reason should no longer apply, since bilingual
projects nationwide now have more stability, as a result of a gradually
growing accumulation of experience, methods, and material. But if bilingual
education is to continue to advance, better and more meaningful evaluation
is necessary.

With respect to project-centered instructional objectives, more than
ever before there is a need to entertain the larger questions as well.
Tucker and d'Anglejan (1971) question whether "self-centered" project goals,
such as meeting specific teaching objectives, are valid criteria for
evaluating the success or failure of a program (e.g., 75% of the children can
answer 90% of the questions in a certain section of a book). Whether or
not such criteria are valid, there is more to formative evaluation, such as
investigation of the following areas (adapted from Saville and Troike, 1971):



(1) The teaching techniques that prove most successful in different
situations (grouping, sequencing and pacing of materials, and
correction procedures).

(2) The effect of program design (e.g., partial or full bilingual
schooling using a concurrent, dual language, or alternate days
approach to instruction).

(3) The effect of teacher training and patterns of staff utilization.

The lack of adequate instruments is still a problem, though not as severe
as it was years ago, when federally funded program evaluations were initiated.
Yet evaluation must proceed even if the most appropriate instrument is not
available. Recently even some widely used standardized instruments, such as the
Peabody Picture Vocabulary Test and the Cooperative Primary Test of Reading,
have been subject to criticism (Cicourel et al., 1974). We seem to be entering
an era in which ethnomethodological scrutiny of tests and of individual test
items will be common practice.

Finally, it would appear that bilingual schooling is here to stay, at
least for the foreseeable future. Thus, evaluation reports should reflect
more than a morass of tabular data and a scattering of carefully selected
and tentatively, or even ambiguously, worded findings. Instead, the findings
should reflect strengths as well as weaknesses and, even more important, should
be structured in a way to provide comparable data across programs, and should
be designed so as to provide feedback to aid in the ongoing improvement of
program practices. It is regrettable that the tendency to avoid measures which
might produce negative results has all but precluded the possibility of
learning from project deficiencies (Berman and McLaughlin, 1974).

Given the current developments in the field of research, there appear to
be few obstacles to conducting sound, rigorous evaluation of bilingual programs.
Such evaluation would reflect the following elements:

(1) Careful collection of meaningful baseline data from selected
subjects.

(2) The identification and development of instruments to measure
key variables.

(3) The identification of treatment characteristics and documentation
on the implementation and development of the program.

(4) The establishment of longitudinality.

(5) The interpretation of results in implementable terms that are
meaningful to teachers, policy makers, and researchers.

Whereas the research literature on bilingual schooling is generally
lacking rigorous longitudinal evaluations, several such investigations have
been conducted in Redwood City (Cohen, 1975), in Culver City (Cohen, 1974),
in Montreal (Bruck et al., 1975), and in Illinois (Cohen and
Rodríguez-Brown, 1977).
Rationale

Although several attempts have been made to develop comprehensive
evaluation studies which will shed some light on the state of affairs of
bilingual education programs (Cohen, 1974), most of the studies done up
to now contain severe problems in the areas of design, usability and
applicability of dependent and independent variables, data management,
program documentation, and interpretation of findings (e.g., AIR report,
1976; Chicago Board of Education, 1976).

It is important, then, to review some of the most common problems found
in evaluation studies of bilingual education and to describe what could be
a realistic design for evaluation of bilingual programs. The design is
realistic in the sense that the author (knowledgeable of the nature and actual
functioning of bilingual programs) will try to accommodate all the intricacies
into his/her planning and come up with a design that is valid for the situation,
although not the strongest from the research point of view. The purpose
of the paper is, then, to pinpoint some of the issues usually overlooked by
people evaluating bilingual education, and to recommend alternative ways to
collect, look at, and interpret data.
The Role of Evaluation in the Implementation of Educational Programs

The skepticism with which most people (administrators, teachers, etc.)
involved in programs that require annual evaluation reports (e.g., Title I,
Title VII) see the role of the evaluator is not unknown to people working
on evaluation studies. There is, in general, a misunderstanding as to the
role of the evaluator and/or the evaluation studies, which needs to be
clarified.

The role of the evaluator is mainly that of describing the data he/she
collects from the program and interpreting those results. The evaluator is
not interested in individual teachers and how their students perform. He/she
is not interested in looking at individual student scores. His/her role is
mainly to explain, from the data at hand, what the status of the program is.
This implies that the evaluator should pinpoint strengths and weaknesses of
the program. He/she may directly recommend some changes in the structure
and/or design of the program, or he/she may recommend that an expert in the
discipline involved in the program review the program and change it
accordingly. The function of evaluation is to encourage the implementation of
better programs.



Of course, the author is conscious of the misuse or misinterpretation
of evaluation data. This misuse, though, very seldom comes from the
evaluator himself/herself. Most of the time the data are misinterpreted when
they get into the hands of the administrators, teachers, and the general
public. It is therefore relevant to state that the data are misinterpreted not
because the public is against a program (in most cases), but because the public
misunderstands the role of evaluation as it is specified above.

After this brief clarification as to the role of the evaluator and the
purpose of evaluation studies, the following sections of this paper cover
the different components involved in evaluation studies and, specifically,
in bilingual education evaluation studies.



The Evaluator or Evaluation Team

One of the problems found in bilingual education evaluation projects
is the fact that the people in charge of the evaluations are experts either
in research and evaluation or in bilingual education, but rarely in both. It
is desirable to have a team of people involved in evaluation projects; this
way, it is possible to include in it people knowledgeable in research and
evaluation as well as specialists in bilingual education.

When the evaluators' expertise is in research and evaluation, they may
not be knowledgeable of the intricacies and particularities of the programs
studied. The evaluator, as an example, may decide to evaluate bilingual
programs by using a strong design that requires a control group. This design
supposedly would make the study more valid statistically, but it is almost
impossible to find a good control group composed of children from the same
linguistic, cultural, and socioeconomic background who are not participating
in bilingual programs. Most states that have a large population of
linguistically and culturally different children mandate bilingual education
and, furthermore, all children who need bilingual education are supposed to
participate in the program.

On the other hand, if the evaluation team is formed only by people whose
expertise is in bilingual education, the evaluation may lack a design at all.
What happens in this situation is that tests are administered in great
quantities. They may be scored but, then, no one knows what to do with the data.
A lot of data mismanagement occurs in this situation. Since there is no
plan for the evaluation, there is no documentation as to how the program was
implemented through the year, including such aspects as any environmental
changes, personnel changes, etc., which may have affected the results of the
testing. Often, means and standard deviations are calculated from
the data but no interpretation of results is given.

There is a need, then, to have an evaluation team where both people
specializing in research and evaluation and people in bilingual education
work together and compromise on what may be a weak design in terms of
research and evaluation, but realistic and feasible in terms of the current
issues involved in bilingual education.

Experimental Design and the Evaluation of Bilingual Programs

In the case of bilingual education programs, specifically the programs
in those states where bilingual education is mandated, it is almost impossible
to talk of a rigorous experimental design with a control group and random
assignment.

First of all, since most state laws require that schools with a determined
number of children of the same non-English or limited English speaking
backgrounds have a bilingual program, it would be very hard to find a comparison
group formed by children culturally and linguistically similar who do not
participate in a bilingual program.



With this in mind, the evaluator has to look for a design capable of
[showing] impact, or lack of it, on a program (or treatment) without
a control group. [The remainder of this sentence is illegible in the source.]

(a) Before-and-After design (Center for the Study of Evaluation,
    1974). In this design only the experimental group (in our case,
    the bilingual program group) is measured. The findings, though,
    relate only to the way one program works.

(b) Time Series design (Center for the Study of Evaluation, 1974).

(c) Another alternative is to use participants as their own
    [controls and] determine an expected [level of performance]
    described by objectives. This alternative [words illegible in
    the source] and will require that the district develop its own
    criterion-referenced testing system.

In the case of the Before-and-After design, one group of students takes
a pretest; the same group then gets the treatment (in this case,
participation in the bilingual program); and afterwards the same group takes
a post test. [A sentence concerning whether the tests were normed ones is
illegible in the source.] In the case of bilingual programs it is
recommended that the districts or the states develop their own norms.

This design can be represented, using Campbell and Stanley's signs, as
follows:

                      Time ------------>

    E - group      (pre)      X      (post)
                     O                 O

Where:

    E - group  =  experimental group
    O          =  measurement or observation of some kind
    pre        =  pretest
    post       =  post test
    X          =  treatment, which should be very well documented
                  in this type of design




One of the problems with this design is that it is very hard to explain
[whether the results are due to the program] or to some other factor. It is
impossible to determine what the results would have been without the
[program].

[An advantage] of this design is that, by having to follow only one group,
[the evaluator can spend more] time documenting the program, including
materials and activities and their relation to the objectives to be attained.

[With respect to] the scores, the evaluator could report for each test and
subject tested: 1) the number of students per school (it should include only
students that were pre- and post tested), 2) the mean score for the pretest,
3) the mean score for the post test, and 4) t-test results. The
t-test tests the significance of the difference between the pre- and
post test scores.
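The four reported quantities can be sketched in a few lines. This is only an illustration: it assumes a paired (dependent-samples) t-test, which the source does not name explicitly but which fits a design where the same students are pre- and post tested. The function name `paired_t` and the scores are hypothetical.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (mean_pre, mean_post, t, df) for matched pre/post scores.

    Assumes pre[i] and post[i] belong to the same student, so only
    students tested on both occasions are included.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two matched pre/post pairs")
    gains = [b - a for a, b in zip(pre, post)]
    mean_gain = statistics.mean(gains)
    sd_gain = statistics.stdev(gains)  # sample standard deviation of gains
    if sd_gain == 0:
        raise ValueError("all gains are identical; t is undefined")
    t = mean_gain / (sd_gain / math.sqrt(len(gains)))
    return statistics.mean(pre), statistics.mean(post), t, len(gains) - 1


# Hypothetical scores for one school and one test
pre = [10, 12, 9, 14, 11]
post = [13, 14, 12, 15, 15]

n_students = len(pre)                      # 1) students pre- and post tested
mean_pre, mean_post, t, df = paired_t(pre, post)  # 2), 3), 4)
print(n_students, mean_pre, mean_post, round(t, 3), df)
```

The resulting t would then be compared with a tabled critical value for the given degrees of freedom; that lookup is left out of the sketch.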

Children perform better as they grow older, so the evaluator will probably
find significant differences between pre- and post test results in areas such
as cognitive development and achievement. If no significant difference is
found, the evaluator should look into the treatment documentation data to
make assumptions in this regard. This way, it may be possible to explain the
happening either as a testing error or as a real non-progress situation.

[When results are interpreted by] comparing them to standardized [norms],
it is very important for the evaluator to describe
the population on which the test was normed and to compare both groups. This
is to make sure the groups are similar not only culturally and linguistically
but also contemporaneously and socially.

[The report] for this design should not be based only on standardized
test results (if you have used this type of test and have valid
norms). The report should include 1) all the documentation of the program,
2) a statistical report of pre- and post test data and t-test, 3) a report of
[other data] collected, and, of course, 4) some interpretation
[of the results] and recommendations as to how the program
could be improved.

One of the suggestions given to evaluators who have to use this design
is to develop some sort of measurement-by-objectives evaluation (ideally a
criterion-referenced system). This way it is possible to report on the
program's strong and weak areas according to the performance of students in
[words illegible in the source]. It is recommended, too, that the evaluator
choose tests that are sensitive to the grades of the programs being
evaluated.

In the Time Series design with an experimental group only, the experimental
group is tested several times, at specific intervals of time, before and
after the treatment (bilingual program) starts.

The following is a diagram for this design using Campbell and Stanley's
symbols:

    EXPERIMENTAL GROUP

    O O O O    X    O O O O

    Where:  O = measurements
            X = treatment or program
The main steps in the implementation of this design can be described as:

1. Choose tests or measures that are reliable, valid, and proper to use with
the population you are working with, and that can be used repeatedly.

2. Choose the composition of the experimental group to be tested (i.e.,
the same group tested several times, randomly selected groups each
time, or successive groups of students).

3. Make sure to collect at least 3 measurements at regular intervals
before the program (X) starts.

4. Check and document the implementation of the program.

5. Collect measurements at the same regular intervals as before the
program.

In relation to the composition of the experimental group to be tested,
if the same group is tested all the time, or if randomly selected samples from
the experimental group are tested, the design can be called a longitudinal
time series design. If successive groups of students who supposedly
represent the group are tested each time, the design is called a successive
groups time series. The nomenclature used is the one given by the Center
for the Study of Evaluation, UCLA (1974).
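One way to read a time series of this kind is to project the pre-program trend forward and see how far the post-program measurements depart from it. The sketch below is an assumption about how such a comparison might be made, not a procedure given in the source; the function names and the group means are hypothetical, and an ordinary least-squares line stands in for whatever trend model an evaluator would actually choose.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def departures_from_trend(pre_times, pre_means, post_times, post_means):
    """Observed post-program means minus the projected pre-program trend."""
    slope, intercept = fit_line(pre_times, pre_means)
    return [obs - (intercept + slope * t)
            for t, obs in zip(post_times, post_means)]


# Hypothetical group means: O O O O  X  O O O O, at equal intervals
pre_times, pre_means = [1, 2, 3, 4], [20.0, 21.0, 22.0, 23.0]
post_times, post_means = [5, 6, 7, 8], [27.0, 28.5, 30.0, 31.0]

gaps = departures_from_trend(pre_times, pre_means, post_times, post_means)
print(gaps)
```

Positive departures that persist across the post-program measurements are only suggestive; as the text notes, the implementation must still be documented before any gain is attributed to the program.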

The following are several aspects that the evaluator using this design
should take into account when describing the program implementation.

1. It is important to specify whether the program was implemented and when.
Implementation data, including exact dates, should be used in
documenting the program.

2. Make sure that documentation information includes any happenings
occurring during the time of implementation (e.g., a new teacher came
in at the same time the program was being implemented). To make
any statements related to the effect of the program, the influence of
other aspects, such as bringing in a new teacher, should be minimized;
otherwise a correct explanation of the effect of the program would
be impossible.



3. Changes in the method of collecting data should be documented and
explained. Were the same tests or instruments used all the time?
If they were different, how different were they? Is there any
way to make the scores comparable?

4. Explain any changes in the composition of the experimental group.
Is the group the same as when the program started? If different,
how different? If you think the group has changed a lot, it would
be helpful to look at another set of test scores. For example,
if the program treatment received by the experimental group is in
reading, you should collect data on math. Later, if there is any
question as to whether the nature of the experimental group has
changed and produced an effect in scores (e.g., brighter children
came into the program), the math as well as the reading test results
could be compared. If the reading scores go up significantly but
the math scores do not, it is possible to say that the higher score
in reading is due to the program.

5. Check to see if the results show a cyclical pattern. If this is
so, then the results may not be due to the program. For example, it
may be that results peak at some point during the year. This can
be checked by looking at scores from previous years.
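The reading-versus-math check in item 4 can be illustrated with a small sketch. Everything here is hypothetical: the function names, the scores, and in particular the `min_gain` cutoff, which stands in for the significance test the text calls for and is used only to keep the example short.

```python
import statistics


def mean_gain(before, after):
    """Average per-student gain between two testings of the same students."""
    return statistics.mean([a - b for b, a in zip(before, after)])


def composition_check(treated_before, treated_after,
                      other_before, other_after, min_gain):
    """Compare the gain in the treated subject with an untreated subject."""
    treated = mean_gain(treated_before, treated_after)
    other = mean_gain(other_before, other_after)
    if treated < min_gain:
        verdict = "no notable gain in the treated subject"
    elif other >= min_gain:
        verdict = "both subjects rose; the group itself may have changed"
    else:
        verdict = "gain confined to treated subject; consistent with a program effect"
    return treated, other, verdict


# Hypothetical scores: the program treatment is in reading, math is untreated
reading_before, reading_after = [20, 22, 19, 21], [26, 27, 25, 28]
math_before, math_after = [30, 31, 29, 32], [31, 31, 30, 32]

treated, other, verdict = composition_check(
    reading_before, reading_after, math_before, math_after, min_gain=3)
print(treated, other, verdict)
```

If both subjects had risen by a comparable amount, the sketch would flag a possible change in the group rather than a program effect, which is the distinction item 4 draws.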



Even when a lot of different data are collected to document the
implementation of the program, there exists the possibility that the results
from the evaluation are not due to the program. Since this design is not very
strong, it is called a "quasi-experimental design" (Campbell and Stanley,
1963).

It is the view of the author that, due to the problems involved in
finding a control group to evaluate bilingual programs, the next best
design to be used is the Time Series design. It is recommended that
either the same group or a randomly selected sample from the experimental
group be tested each time so that it is possible to attain some
longitudinality with the data collected.



- What are the goals or objectives of the school
district or educational program?

- What controls are needed to provide meaningful
comparisons among programs?

From answers to the latter question, the set of independent variables
can be developed. With this idea in mind, a series of independent variables
is defined that could be relevant to a longitudinal study of bilingual
schooling. These variables are loosely defined and only attempt to touch on
factors that should be considered for an evaluation design. The importance of
each factor as a source of variation that must be controlled must be judged in
light of the specific evaluation setting.

These variables have been divided into three groups or categories:

    I.   Contextual Variables
    II.  Student Variables
    III. Treatment Variables

A list of these variables under each category is given below.

I. Contextual Variables

    A. School district characteristics
        1. size
        2. resources
        3. ethnic composition
        4. degree of integration
        5. SES composition

    B. Community characteristics
        1. density
        2. SES composition
        3. degree of integration
        4. occupational make-up
        5. political involvement, including
           involvement in school district
        6. educational attitudes

    C. Parent characteristics
        1. schooling
        2. occupation
        3. ethnicity
        4. attitudes
        5. involvement
        6. SES status
        7. dominant language
        8. children-home life

II. Student Variables

    A. Physical characteristics
        1. sex
        2. size
        3. health (physical handicaps)
        4. age

    B. Education
        1. level of schooling
        2. years of schooling
        3. schooling characteristics
            a. grades (marks)
            b. continuity
            c. special program (other bilingual programs)
        4. attitude toward school and education
        5. dominant language
            a. writing
            b. reading
            c. speaking
            d. listening
        6. achievement
            a. home language context
            b. English context

    C. Peer relations

    D. Language association
        1. years in U.S.
        2. age of first association with English
        3. duration and time history with association
        4. intensity of association

III. Treatment Variables

    A. Setting
        1. school characteristics
        2. classroom characteristics
        3. other programs employed

    B. Program characteristics
        1. size
        2. staffing characteristics
        3. personnel relations
        4. selection criteria
        5. curriculum
            a. design
            b. organization
            c. role of culture
        6. materials
        7. language usage
            a. allocation to subjects
            b. amount
            c. method
            d. peer usage
            e. student teacher usage

Some people have suggested or even used program models as an independent
variable (Board of Education, City of Chicago, 1976). It is the author's
personal view that program models and instructional structures as they are
implemented now are very loosely defined.

Even if you are comparing two bilingual programs described as half-day
programs, the two may be structured very differently and may not be providing
an equal treatment. It is for this reason that comparisons between different
program models are not accurate and present a lot of measurement and
interpretation problems. As one can see, the list of independent variables is
lengthy to begin with, and this list is by no means complete.

The answers to the question concerning the goals and objectives of the
program should define the dependent variables to be measured. With respect to
dependent variables, we try here to cover general areas of concern. As of now,
there are not many good tests available for the measurement of common bilingual
program objectives. We feel that any test chosen should measure the objectives
of the program and should not be used only because of its availability. This
may require the development of tests designed specifically for a given
bilingual program.

Some usual measures that reflect bilingual objectives are:

I. Achievement

    A. Reading
        1. home language
        2. English

    B. Academic and cognitive ability
        1. in home language
        2. in English

    C. Math

    D. Science

    E. Social studies

    F. Language ability
        1. listening
        2. reading
        3. writing
        4. speaking

II. Language Dominance and Parity

III. Affective Development

    A. Self-esteem

    B. Self-concept

    C. Attitudes

One of the most prominent dilemmas in evaluation is the validity of measures
on the independent and dependent variables. Some considerations in this area
are:

1. The measures should at least be reliable.

2. When normed scores are used, the norming group should have similar
characteristics to the children being measured (i.e., language,
culture, socioeconomic status, etc.), and measurement should
be made on the treatment groups at the same time during the
school year that the norm group was tested.

3. If the instruments have parallel forms in English and a second
language, the forms should have been adapted and not just
translated. If translated, they should have been field tested.

4. The language used to obtain measures should only include the
language the children use at their developmental level.

5. The measures should be culturally sensitive.

6. Administration and scoring should be straightforward and
objective.

Some unique problems of validity occur when pretest and posttest differences
or more complex measures of change are used. These problems are aptly described
by Lord (1963). Namely, these problems include the regression effect paradox,
the reliability of estimated change, the effect of change on group
heterogeneity, and spurious correlation between change and some other variable.

Another assault on the validity of longitudinal studies occurs by the mere fact
that the time period over which measures are made is greater than in the
one-shot design. Thus, changes can occur in the time-dependent contextual
variables. For this reason, these variables should be measured on more than one
occasion along with the dependent measures. Ideally, such measures should be
made concurrently.

Data Collection and Management in Evaluation Projects

One of the most straightforward tasks of evaluation studies in general, and
particularly of longitudinal studies, is data collection and management. Yet
this task is usually the one that requires the most effort and is usually
poorly done, resulting in invalid evaluations. Competency of data gatherers and
their managers is mandatory. Some considerations in collecting and managing the
data are:

1. Data should be maintained on a per student basis.

2. All students should be given one and only one unique identification
number and this should be recorded on all information
collected.

3. A computerized data base should be developed where possible to
organize and maintain the data.

4. Sorting of students by informative identification numbers can
provide an easy-to-use directory.

5. Meaningful identification numbers can be produced by using
indicators of student characteristics such as the school,
program, and section he/she is enrolled in, his/her birth year, the year
he/she entered the program, the grade he/she entered the program, etc.

6. Computer routines for validity checking should be incorporated
into the data management system.

7. Simple editing, sorting, and merging routines should be set up
for production use.

8. All data collection and management activities should be the
responsibility of one person. This will avoid the confusion and
misinformation that normally occur when many data gathering
activities are undertaken.

9. Many of the data management duties require the technical
expertise of a good computer services staff which has some
knowledge of statistical software that may be applied for
evaluation.
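Items 2, 4, 5, and 6 can be illustrated together. The sketch below is one possible layout, not a scheme prescribed by the source: the field widths, the `serial` field used to keep numbers unique, the record format, and the score range are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    school: int
    program: int
    section: int
    birth_year: int
    entry_year: int
    entry_grade: int
    serial: int  # hypothetical tie-breaker so every number stays unique


def make_id(s: Student) -> str:
    """Build a meaningful, sortable identification number (assumed layout)."""
    return "-".join([
        f"{s.school:02d}", f"{s.program:d}", f"{s.section:d}",
        f"{s.birth_year % 100:02d}", f"{s.entry_year % 100:02d}",
        f"{s.entry_grade:02d}", f"{s.serial:03d}",
    ])


def check_scores(rows, max_score):
    """Validity check on (student_id, test_name, score) rows.

    Flags duplicate records and scores outside 0..max_score.
    """
    problems = []
    counts = Counter((sid, test) for sid, test, _ in rows)
    for (sid, test), n in counts.items():
        if n > 1:
            problems.append(f"duplicate record: {sid} {test}")
    for sid, test, score in rows:
        if not 0 <= score <= max_score:
            problems.append(f"score out of range: {sid} {test} {score}")
    return problems


students = [Student(3, 1, 2, 1968, 1975, 1, 7),
            Student(1, 1, 1, 1969, 1975, 0, 12)]
ids = sorted(make_id(s) for s in students)  # sorted IDs act as a directory
rows = [(ids[0], "reading_pre", 31),
        (ids[0], "reading_pre", 31),     # entered twice by mistake
        (ids[1], "reading_pre", 140)]    # impossible on a 100-point test
problems = check_scores(rows, max_score=100)
print(ids, problems)
```

Because the school, program, and section come first, sorting the numbers groups students the way a printed directory would.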

Longitudinal designs are more susceptible to missing data problems
through attrition and other reasons. All efforts should be made to avoid
missing data. Where such problems do occur, there is very little elegant
recourse. Some possible compensation steps, which are not without bias, are:

- Exclude records that have missing data.

- Estimate missing data from regression equations
developed from available data. (In this case, the
95% confidence intervals could be used rather than
the point estimates, and appropriate maximum
likelihood regression techniques could be applied to
handle the mixed data forms, that is, point and interval
values.)

- Scale down the evaluation to include only that set
of variables for which complete data are available.

As stated, each of these approaches is biased. The degree to which they can
be applied depends on the data at hand.
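The first two compensation steps can be sketched as follows. This is a simplified illustration under stated assumptions: records are hypothetical (pretest, post test) pairs, only the post test may be missing, and the regression fills in point estimates rather than the confidence intervals and maximum likelihood techniques mentioned above.

```python
def complete_cases(records):
    """Step 1: keep only records with no missing value (None)."""
    return [r for r in records if None not in r]


def impute_post(records):
    """Step 2: estimate a missing post test from the pretest.

    Fits a least-squares line post = intercept + slope * pre on the
    complete records and fills in point estimates only.
    """
    complete = complete_cases(records)
    if len(complete) < 2:
        raise ValueError("need at least two complete records to fit")
    n = len(complete)
    mean_pre = sum(pre for pre, _ in complete) / n
    mean_post = sum(post for _, post in complete) / n
    sxx = sum((pre - mean_pre) ** 2 for pre, _ in complete)
    if sxx == 0:
        raise ValueError("pretest scores do not vary; cannot fit a line")
    sxy = sum((pre - mean_pre) * (post - mean_post) for pre, post in complete)
    slope = sxy / sxx
    intercept = mean_post - slope * mean_pre
    return [(pre, post if post is not None or pre is None
             else intercept + slope * pre)
            for pre, post in records]


# Hypothetical records; the last student missed the post test
records = [(10, 14), (12, 16), (14, 18), (16, None)]
kept = complete_cases(records)
filled = impute_post(records)
print(kept, filled)
```

Either step changes the sample in its own way: exclusion drops the students most likely to have left the program, while imputation makes the filled-in scores look more regular than real ones would.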

Another primary dilemma of longitudinal evaluation, and specifically of
bilingual evaluation, is the comparability of measurement instruments over time.
Tests in bilingual education that represent a continuum over various
educational and developmental levels are scarce. These tests are usually not
normed and thus one level is not related to others. The concept of grade
equivalents has not been applied to bilingual measures. Thus measures of student
progress over time may have to be developed before meaningful trend analysis can
be performed. This is a major problem since a great deal of time, effort, and
expertise must be employed to develop tests that measure the same concepts at
various levels. The authors have no good suggestions for handling this problem
other than to start from scratch.



Conclusion 



There is a great need to have good evaluation studies in regard to
bilingual education programs. We want to know what the reality in these programs
is today; specifically, we want to know what their strengths and weaknesses are.
This way, programs can be better implemented.

There are many issues which should be taken into account when designing
evaluation studies in bilingual settings. This paper has tried to make the
reader conscious of two salient issues related to evaluation of bilingual
programs: 1) there is a need to identify an evaluation team with diversified
interests and expertise, and 2) the evaluator must be aware of the restrictions
in properly identifying a control group.

Under the constraints caused by the nature of bilingual education
programs, it seems that the time series design, where a random sample of the
whole population is tested at different points of time, is the best
alternative to the strong designs used in evaluation studies. Since this design
is a weak one and there is no control group involved, the need for good,
comprehensive documentation of the implementation of the program is crucial.
This will help evaluators clarify and explain whether the findings from the
evaluation study can be interpreted as due to the program and not to
other variables unrelated to the program.

The author recommends that independent variables used in evaluation
studies be chosen in relation to the particular needs and characteristics
of the setting being studied. It is recommended, though, that comparisons among
children attending different program models (e.g., half-day programs vs.
self-contained) not be made at this point in time. It is not possible to
compare children attending one program model against another because the program
attended may not be the same even within a school year. Program models, too,
have been very vague and ill-defined up to now. Although two programs may be
called by the same name (e.g., half-day), they may provide completely different
treatments and as such they cannot be compared as to their effect
on student achievement variables.

Finally, the need for a data management system is stressed, especially
in longitudinal studies. This is a component of evaluation which is often
overlooked but which can produce problems when analysing data and drawing
conclusions from findings.